NSF PAR Search | NSF Public Access Repository


PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor

https://doi.org/10.1109/CVPR52733.2024.00822

Goel, Vidit; Peruzzo, Elia; Jiang, Yifan; Xu, Dejia; Xu, Xingqian; Sebe, Nicu; Darrell, Trevor; Wang, Zhangyang; Shi, Humphrey (June 2024, IEEE)

Full Text Available
Aligning Large Multimodal Models with Factually Augmented RLHF

Sun, Zhiqing; Shen, Sheng; Cao, Shengcao; Liu, Haotian; Li, Chunyuan; Shen, Yikang; Gan, Chuang; Gui, Liangyan; Wang, Yu-Xiong; Yang, Yiming; et al. (August 2024, Findings of the Association for Computational Linguistics (ACL Findings))

Full Text Available
CLAIR: Evaluating Image Captions with Large Language Models

https://doi.org/10.18653/v1/2023.emnlp-main.841

Chan, David; Petryk, Suzanne; Gonzalez, Joseph; Darrell, Trevor; Canny, John (December 2023, Association for Computational Linguistics)

Full Text Available
Simple Token-Level Confidence Improves Caption Correctness

https://doi.org/10.1109/WACV57701.2024.00564

Petryk, Suzanne; Whitehead, Spencer; Gonzalez, Joseph E; Darrell, Trevor; Rohrbach, Anna; Rohrbach, Marcus (January 2024, IEEE)

Full Text Available
CLAIR: Evaluating Image Captions with Large Language Models

Chan, David; Petryk, Suzanne; Gonzalez, Joseph E; Darrell, Trevor; Canny, John F (October 2023, arXiv)

Full Text Available
Simple Token-Level Confidence Improves Caption Correctness

Petryk, Suzanne; Whitehead, Spencer; Gonzalez, Joseph E; Darrell, Trevor; Rohrbach, Anna; Rohrbach, Marcus (May 2023, arXiv)

Full Text Available
Using Language to Extend to Unseen Domains

Dunlap, Lisa; Mohri, Clara; Guillory, Devin; Zhang, Han; Darrell, Trevor; Gonzalez, Joseph E; Raghunathan, Aditi; Rohrbach, Anna (May 2023, International Conference on Learning Representations (ICLR))

Full Text Available
Guiding Pretraining in Reinforcement Learning with Large Language Models

Du, Yuqing; Watkins, Olivia; Wang, Zihan; Colas, Cédric; Darrell, Trevor; Abbeel, Pieter; Gupta, Abhishek; Andreas, Jacob (January 2023, International Conference on Machine Learning)

Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions, but these methods offer limited benefits in large environments where most discovered novelty is irrelevant for downstream tasks. We describe a method that uses background knowledge from text corpora to shape exploration. This method, called ELLM (Exploring with LLMs), rewards an agent for achieving goals suggested by a language model prompted with a description of the agent’s current state. By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop. We evaluate ELLM in the Crafter game environment and the Housekeep robotic simulator, showing that ELLM-trained agents have better coverage of common-sense behaviors during pretraining and usually match or improve performance on a range of downstream tasks.
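The reward idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the goal suggester below is a hard-coded stand-in for a prompted language model, the word-overlap similarity and the 0.5 threshold are assumptions chosen only to keep the example self-contained, and every function name is hypothetical.

```python
from typing import Callable, Iterable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase a string and split it into a set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]. An assumed stand-in for a learned text similarity."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def stub_goal_suggester(state_description: str) -> List[str]:
    """Hypothetical placeholder for a language model prompted with the agent's state.

    A real system would send state_description to a language model; this stub
    ignores it and returns fixed goals.
    """
    return ["chop tree", "drink water"]


def goal_reward(
    achieved: str,
    suggested_goals: Iterable[str],
    threshold: float = 0.5,  # assumed cutoff, not taken from the abstract
    similarity: Callable[[str, str], float] = jaccard,
) -> float:
    """Reward the agent when what it just achieved matches a suggested goal.

    Returns the best similarity if it clears the threshold, otherwise 0.0.
    """
    best = max((similarity(achieved, g) for g in suggested_goals), default=0.0)
    return best if best >= threshold else 0.0


goals = stub_goal_suggester("You see a tree and a lake. You are thirsty.")
matched = goal_reward("chop tree", goals)
unmatched = goal_reward("attack zombie", goals)
```

In this sketch an achievement that matches a suggested goal earns a positive reward, and one that matches none earns zero, which is the shaping behavior the abstract describes at a high level.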
Full Text Available
Describing differences in image sets with natural language

Dunlap, Lisa; Zhang, Yuhui; Wang, Xiaohan; Zhong, Ruiqi; Darrell, Trevor; Steinhardt, Jacob; Gonzalez, Joseph E; Yeung-Levy, Serena (January 2023, CVPR 2024)

Full Text Available
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension

https://doi.org/10.18653/v1/2022.acl-long.357

Subramanian, Sanjay; Merrill, William; Darrell, Trevor; Gardner, Matt; Singh, Sameer; Rohrbach, Anna (January 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain. While large-scale pre-trained models are useful for image classification across domains, it remains unclear if they can be applied in a zero-shot manner to more complex tasks like ReC. We present ReCLIP, a simple but strong zero-shot baseline that repurposes CLIP, a state-of-the-art large-scale model, for ReC. Motivated by the close connection between ReC and CLIP’s contrastive pre-training objective, the first component of ReCLIP is a region-scoring method that isolates object proposals via cropping and blurring, and passes them to CLIP. However, through controlled experiments on a synthetic dataset, we find that CLIP is largely incapable of performing spatial reasoning off-the-shelf. Thus, the second component of ReCLIP is a spatial relation resolver that handles several types of spatial relations. We reduce the gap between zero-shot baselines from prior work and supervised models by as much as 29% on RefCOCOg, and on RefGTA (video game imagery), ReCLIP’s relative improvement over supervised ReC models trained on real images is 8%.
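The region-scoring step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the ReCLIP code: images are plain nested lists of grayscale values, the blur is a simple box filter, the image-text scorer is a toy brightness function standing in for CLIP, and summing the crop and blur scores is an assumed way to combine them. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]        # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1 and y1 exclusive


def crop(image: Image, box: Box) -> Image:
    """Isolate a proposal by keeping only the pixels inside the box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def blur_outside(image: Image, box: Box, radius: int = 1) -> Image:
    """Isolate a proposal by box-blurring every pixel outside the box."""
    x0, y0, x1, y1 = box
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if x0 <= x < x1 and y0 <= y < y1:
                continue  # pixels inside the proposal stay sharp
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = sum(window) / len(window)
    return out


def toy_score(image: Image, expression: str) -> float:
    """Hypothetical stand-in for an image-text model: mean brightness, text ignored."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def pick_region(
    image: Image,
    boxes: Sequence[Box],
    expression: str,
    score_fn: Callable[[Image, str], float],
) -> int:
    """Return the index of the proposal whose isolated views score highest."""
    scores = [
        score_fn(crop(image, b), expression)
        + score_fn(blur_outside(image, b), expression)  # assumed combination
        for b in boxes
    ]
    return max(range(len(boxes)), key=scores.__getitem__)


image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
boxes = [(0, 0, 2, 2), (2, 0, 4, 2)]
best = pick_region(image, boxes, "the bright square", toy_score)
```

With a real image-text model in place of toy_score, the same loop would rank proposals by how well each isolated region matches the referring expression.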
Full Text Available
